[Speculative Decoding] fix mtp stop_seqs and limit thinking bugs#7166
lonelygsh wants to merge 1 commit into PaddlePaddle:develop
Thanks for your contribution!
guanshihui seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it.
ba88df0 to 0f4325c
0f4325c to 41a8185
41a8185 to 8dea198
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## develop #7166 +/- ##
==========================================
Coverage ? 73.62%
==========================================
Files ? 383
Lines ? 53513
Branches ? 8378
==========================================
Hits ? 39401
Misses ? 11361
Partials ? 2751
8dea198 to ae2f9f4
52711f9 to b37c463
b37c463 to dd2326a
dd2326a to 4ab41f1
4ab41f1 to a0be6ee
…_stop_value kernels
- speculate_limit_thinking_content_length: update current_base_step to step_idx+1 (step_idx now records the history count before the current round); remove the incorrect step_idx decrement on accept_num truncation; mark the step_idx param as const.
- speculate_set_stop_value_multi_seqs: fix the can_stop gate to use step_idx_now+accept_num>=min_token_limit; fix the skip check and the pre_ids_idx formula (remove the stale -accept_num offset); use a <= condition so accept_idx maps directly to the accepted token that ends the stop sequence; fix the accept_tokens index (remove -1).
- Update unit tests for the speculate_set_stop_value_multi_seqs kernel.
a0be6ee to 99b5c45
fastdeploy-bot left a comment
🤖 AI Code Review | 2026-04-08
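📋 Review Summary
PR overview: Fixes index errors in two speculative decoding kernels caused by the change in step_idx semantics
Change scope: custom_ops/gpu_ops/speculate_decoding/ (2 CUDA kernels + 1 test file)
Impact tags: [Speculative Decoding] [BugFix]
PR Convention Check
The PR meets the conventions:
- ✅ Title includes the [Speculative Decoding] tag
- ✅ Description includes Motivation and Modifications
- ✅ Test changes are described
Issues
No blocking issues found.
Overall Assessment
This PR fixes index calculation errors introduced when the semantics of step_idx changed from "includes this round's tokens" to "excludes this round's tokens". After code analysis, the fix logic in both kernels is correct:
- speculate_set_stop_value_multi_seqs.cu:
  - can_stop check: step_idx_now >= min_token_limit → step_idx_now + accept_num >= min_token_limit ✓
  - The skip condition, accept token routing, and index calculation all correctly drop the -accept_num offset left over from the old semantics ✓
  - The new bounds guard accept_idx <= accept_num - 2 prevents an out-of-bounds eos write ✓
- speculate_limit_thinking_content_length.cu:
  - The current_base_step calculation is fixed correctly ✓
  - The step_idx rollback logic is removed, consistent with read-only semantics ✓
  - Changing the parameter to const int64_t* is semantically correct ✓
- Test coverage is sufficient:
  - The reference implementation is updated to match the CUDA kernel logic
  - A new test, test_stop_seq_at_last_position_not_detected, verifies the boundary behavior
  - All assertions match the expected output under the new semantics
It has been confirmed that the other modules using step_idx (speculate_verify.cu, unified_update_model_status.cu, etc.) have consistent semantics and need no corresponding changes.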
Motivation
This PR fixes index errors in two speculative decoding kernels, speculate_set_stop_value_multi_seqs and speculate_limit_thinking_content_length, caused by the change in step_idx semantics.
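To make the semantic change concrete, here is a minimal Python sketch (not code from this PR). It assumes, following the review's description, that the old step_idx counted this round's accepted tokens and the new one does not, so old = new + accept_num. The helper names are hypothetical.

```python
def tokens_through_old(step_idx_old: int, accept_num: int, accept_idx: int) -> int:
    """Tokens generated up to and including accepted token `accept_idx`,
    under the OLD semantics (step_idx already includes this round)."""
    return step_idx_old - accept_num + accept_idx + 1


def tokens_through_new(step_idx_new: int, accept_idx: int) -> int:
    """Same quantity under the NEW semantics (step_idx excludes this round)."""
    return step_idx_new + accept_idx + 1


# Assumed scenario: 10 tokens of history, 3 tokens accepted this round.
history_len, accept_num = 10, 3
step_idx_old = history_len + accept_num  # old: history + this round
step_idx_new = history_len               # new: history only

# Both formulas agree when each is fed the step_idx it was written for.
agree = all(
    tokens_through_old(step_idx_old, accept_num, k) == tokens_through_new(step_idx_new, k)
    for k in range(accept_num)
)

# The class of bug being fixed: the old formula applied to the new step_idx
# undercounts by accept_num.
stale = tokens_through_old(step_idx_new, accept_num, 0)
correct = tokens_through_new(step_idx_new, 0)
```

Under these assumptions, every formula that kept a `- accept_num` term after the semantics changed is off by exactly accept_num, which is the offset the PR removes.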
Modifications
speculate_set_stop_value_multi_seqs
Fix the can_stop check: step_idx_now >= min_token_limit → step_idx_now + accept_num >= min_token_limit, because step_idx no longer includes this round's tokens.
Fix the skip condition: step_idx_now - accept_num + accept_idx + 1 < stop_seq_len → step_idx_now + accept_idx + 1 < stop_seq_len, dropping the -accept_num offset left over from the old semantics.
Fix the accept token routing condition: stop_seq_len - 1 - i < accept_idx → stop_seq_len - 1 - i <= accept_idx, so that accept_idx maps directly to the accept token position where the stop sequence ends, which makes the semantics clearer.
Fix the accept_tokens index: remove the redundant -1 offset.
Fix the pre_ids_idx calculation: step_idx_now - accept_num + accept_idx - offset → step_idx_now + accept_idx - offset, dropping the - accept_num offset left over from the old semantics.
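The corrected matching logic can be sketched in plain Python for a single sequence and a single stop sequence. This is an illustrative reference under stated assumptions, not the kernel or the PR's test code: `find_stop_end` is a hypothetical name, `history` stands in for pre_ids with len(history) == step_idx, offset is taken to be stop_seq_len - 1 - i, and the loop bound mirrors the accept_idx <= accept_num - 2 guard mentioned in the review.

```python
from typing import Sequence


def find_stop_end(
    history: Sequence[int],
    accept_tokens: Sequence[int],
    stop_seq: Sequence[int],
    min_token_limit: int = 0,
) -> int:
    """Return the accept_idx whose token completes stop_seq, or -1 if none."""
    step_idx = len(history)          # new semantics: history only
    accept_num = len(accept_tokens)
    stop_seq_len = len(stop_seq)
    if stop_seq_len == 0:
        return -1
    # can_stop gate: count this round's tokens on top of step_idx.
    if step_idx + accept_num < min_token_limit:
        return -1
    # Assumed guard from the review: accept_idx <= accept_num - 2.
    for accept_idx in range(accept_num - 1):
        # Skip check: not enough tokens yet to contain the stop sequence.
        if step_idx + accept_idx + 1 < stop_seq_len:
            continue
        matched = True
        for i in range(stop_seq_len - 1, -1, -1):
            offset = stop_seq_len - 1 - i
            if offset <= accept_idx:
                # Token comes from this round's accepted tokens (no -1).
                token = accept_tokens[accept_idx - offset]
            else:
                # Token comes from history (no -accept_num offset).
                token = history[step_idx + accept_idx - offset]
            if token != stop_seq[i]:
                matched = False
                break
        if matched:
            return accept_idx
    return -1
```

For example, with history [1, 2, 3] and accepted tokens [4, 5, 6], the stop sequence [3, 4] spans history and the current round and ends at accept_idx 0, while [5, 6] ends at the last accepted position and is deliberately not reported in this sketch.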
speculate_limit_thinking_content_length
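Fix the current_base_step calculation: step_idx[bid] - original_accept_num + 1 → step_idx[bid] + 1, to match the new step_idx semantics.
Remove the step_idx rollback logic: truncating accept_num no longer modifies step_idx.
Make the step_idx parameter const: the kernel no longer writes to step_idx, so the const_cast at the call site is removed.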
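A rough Python sketch of how the corrected base step could drive truncation follows. Only current_base_step = step_idx + 1 and the read-only treatment of step_idx come from this PR's description. The trigger rule (force an end-of-thinking token once the step reaches max_think_len), the function name, and the parameters `max_think_len` and `think_end_id` are assumptions for illustration, and the real kernel's status tracking is omitted.

```python
from typing import List, Sequence, Tuple


def limit_thinking(
    step_idx: int,
    accept_tokens: Sequence[int],
    max_think_len: int,
    think_end_id: int,
) -> Tuple[List[int], int]:
    """Return (possibly truncated tokens, new accept_num); step_idx is only read."""
    # New semantics: step_idx excludes this round, so no -accept_num term.
    current_base_step = step_idx + 1
    tokens = list(accept_tokens)
    for token_offset in range(len(tokens)):
        current_step = current_base_step + token_offset
        if current_step >= max_think_len:
            # Assumed behavior: force the end-of-thinking token and drop
            # the remaining accepted tokens of this round.
            tokens[token_offset] = think_end_id
            truncated = tokens[: token_offset + 1]
            return truncated, len(truncated)
    return tokens, len(tokens)
```

Because the function never writes step_idx, truncating accept_num needs no rollback, which corresponds to marking the parameter const.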
Tests
Update test_speculate_set_stop_value_multi_seqs.py to match the indexing and matching logic under the new step_idx semantics.
Usage or Command
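No new interfaces; this fixes existing logic. It can be verified by running speculative decoding inference and checking that stop sequence truncation and the thinking length limit behave correctly.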
Accuracy Tests
Unit tests pass.
Checklist
pre-commit before commit.
test_speculate_set_stop_value_multi_seqs.py.